Regular Expressions with Numerical Occurrence Indicators - preliminary results

Authors

  • Pekka Kilpeläinen
  • Rauno Tuhkanen
Abstract

Regular expressions with numerical occurrence indicators (#REs) are used in established text manipulation tools like Perl and Unix egrep, and in the recent W3C XML Schema Definition Language. Numerical occurrence indicators do not increase the expressive power of regular expressions, but they do increase the succinctness of expressions by an exponential factor. Therefore, methods based on straightforward translation of #REs into corresponding standard regular expressions are computationally infeasible in the general case. We report some preliminary results about computational problems related to efficient matching and comparison of #REs. Matching, or membership testing of languages described by #REs, is shown to be tractable. Simple comparison problems (inclusion and overlap) of #REs are shown to be NP-hard. We also consider simple #REs consisting of a single symbol and nested numerical occurrence indicators only, and derive a simple numerical test for the membership of a word in the language described by a simple #RE.
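
The succinctness claim and the notion of a simple #RE can be illustrated with a small Python sketch. It is not taken from the paper: the helper names `expand` and `lengths` are hypothetical, a simple #RE is assumed to be given as a symbol plus a list of `(m, n)` bounds with the innermost indicator first, and the membership check is a brute-force computation of word lengths, not the numerical test the authors derive.

```python
import re


def expand(symbol, bounds):
    """Naively rewrite a simple #RE into a counter-free regular expression.

    E{m,n} becomes m mandatory copies of E followed by (n - m) optional
    copies. `bounds` lists the (m, n) pairs, innermost indicator first.
    """
    expr = symbol
    for m, n in bounds:
        required = f"(?:{expr})" * m
        optional = f"(?:{expr})?" * (n - m)
        expr = required + optional
    return expr


def lengths(bounds, limit):
    """Brute force: all k <= limit such that symbol^k is in the language.

    Assumption for illustration only; this enumerates sums of lengths and
    is not the closed-form test mentioned in the abstract.
    """
    current = {1}  # a single symbol has length 1
    for m, n in bounds:
        result = {0} if m == 0 else set()
        sums = {0}
        for j in range(1, n + 1):
            # lengths reachable by concatenating exactly j inner words
            sums = {a + b for a in sums for b in current if a + b <= limit}
            if j >= m:
                result |= sums
        current = result
    return current


if __name__ == "__main__":
    # Size of the naive expansion for k nested {2,3} indicators.
    for k in range(1, 8):
        print(k, len(expand("a", [(2, 3)] * k)))

    # Cross-check the expansion against the brute-force length set.
    bounds = [(3, 4), (1, 2)]  # the simple #RE a{3,4}{1,2}
    pattern = re.compile(expand("a", bounds))
    for k in range(11):
        assert bool(pattern.fullmatch("a" * k)) == (k in lengths(bounds, 10))
```

Under these assumptions, each additional `{2,3}` level roughly triples the size of the expanded expression, while the #RE itself grows by only a few characters. For `a{3,4}{1,2}` the accepted lengths are 3, 4, 6, 7 and 8, so the language of a simple #RE can contain gaps (here, length 5).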

Similar resources

One-unambiguity of regular expressions with numeric occurrence indicators

Regular expressions with numeric occurrence indicators are an extension of traditional regular expressions, which let the required minimum and the allowed maximum number of iterations of subexpressions be described with numeric parameters. We consider the problem of testing whether a given regular expression E with numeric occurrence indicators is 1-unambiguous or not. This condition means, inf...

Optimizing Schema Languages for XML: Numerical Constraints and Interleaving

The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is thereby paramount to establish the exact complexity of ...

Inclusion of Unambiguous RE#s is NP-Hard

We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (#REs) is NP-hard, even if the expressions satisfy the requirement of “unambiguity”, which is required for XML Schema content model expressions. 1 Proof of the result We have seen before [3] that testing for inclusion and overlap of languages represented by #REs is NP-hard. T...

The Membership Problem for Regular Expressions with Unordered Concatenation and Numerical Constraints

We study the membership problem for regular expressions extended with operators for unordered concatenation and numerical constraints. The unordered concatenation of a set of regular expressions denotes all sequences consisting of exactly one word denoted by each of the expressions. Numerical constraints are an extension of regular expressions used in many applications, e.g. text search (e.g., ...

Shorter Regular Expressions from Finite-State Automata

We consider the use of state elimination to construct shorter regular expressions from finite-state automata. Although state elimination is an intuitive method for computing regular expressions from finite-state automata, the resulting regular expressions are often very long and complicated. We examine the minimization of finite-state automata to obtain shorter expressions first. Then, we introd...

Publication date: 2003